Intro to Application Programming Interfaces (APIs)

Kivan Polimis

April 04, 2022

Agenda

1. Introductions

My background

Personal interests

What do I do?

  • I try to break down problems into programmable solutions
    • Coding
  • Experienced teacher of analytics and analytic scaffolding
  • I enjoy playing/watching sports (soccer, basketball, football, and the 2017 World Series Champions, Houston Astros), traveling, movies, and reading.

2. What is an API?

Application Programming Interface (API)

“APIs are mechanisms that enable two software components to communicate with each other using a set of definitions and protocols. For example, the weather bureau’s software system contains daily weather data. The weather app on your phone ‘talks’ to this system via APIs and shows you daily weather updates on your phone.” - Amazon Web Services

In plain English, an API is a standard way for developers to communicate with software applications to request and send data. An API is composed of

  1. formal syntax that governs communication between software and
  2. the software itself

TL;DR

What is an API?

Why use an API (Business Case)

  1. APIs cut down on development time and costs

“We live in the world of API economy, where new software is built by leveraging many different commercial or open source software components,” says [Bhanu] Singh [VP of Engineering] of OpsRamp. “For example, Uber uses a variety of software systems for things like payment, location, maps, and traffic, all of which rely on APIs to communicate.”

Why use an API (Business Case)

  2. APIs reduce complexity

“Without APIs, developers would need to have intimate knowledge of the internal workings of an application to be able to extend its functionality,” says [Glenn] Sullivan [co-founder] at SnapRoute. “Instead, APIs give developers a way of collaborating to build more intricate systems of applications without having to work for the same company or even know each other.”

Why use an API (Business Case)

  3. APIs make everything more programmable

APIs have helped enable the automation boom in the software pipeline and elsewhere in the IT portfolio, for example. And again, in doing so, they reduce manual, repetitive, and often costly effort.

Why use an API (Research Case)

How do I use APIs?

How can you use APIs?

API Types

What is JSON?

https://www.json.org/json-en.html

JSON (JavaScript Object Notation) is a lightweight data-interchange format.

What is JSON?

#' from: https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html
library(jsonlite)
json <-
'[
  {"Name" : "Mario", "Age" : 32, "Occupation" : "Plumber"}, 
  {"Name" : "Peach", "Age" : 21, "Occupation" : "Princess"},
  {},
  {"Name" : "Bowser", "Occupation" : "Koopa"}
]'

prettify(json, indent = 4)
## [
##     {
##         "Name": "Mario",
##         "Age": 32,
##         "Occupation": "Plumber"
##     },
##     {
##         "Name": "Peach",
##         "Age": 21,
##         "Occupation": "Princess"
##     },
##     {
## 
##     },
##     {
##         "Name": "Bowser",
##         "Occupation": "Koopa"
##     }
## ]
## 

What are REST APIs?

REST stands for Representational State Transfer.
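
REST APIs expose resources at URLs and use standard HTTP verbs (GET, POST, PUT, DELETE), typically exchanging JSON. As a minimal sketch, here is a GET request against the CDC endpoint used later in this workshop; `$limit` is a Socrata query parameter that caps the number of rows returned:

```r
library(httr)
library(jsonlite)

#' a REST call is just an HTTP request to a resource URL
resp <- GET("https://data.cdc.gov/resource/bi63-dtpu.json",
            query = list(`$limit` = 5))
status_code(resp)  #' 200 indicates success

#' parse the JSON body into a data frame
deaths <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
head(deaths)
```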

What are the benefits of REST APIs?

REST APIs offer four main benefits:

  1. Integration
  2. Innovation

What are the benefits of REST APIs?

  3. Expansion
  4. Ease of maintenance

How to use an API?

The steps to implement a new API include:

Where can I find new APIs?

New web APIs can be found on API marketplaces and API directories.

3. Census Mortality API

Open Data

Socrata, acquired by Tyler Technologies in 2018, has an “Open Data API [which] allows you to programmatically access a wealth of open data resources from governments, non-profits, and NGOs around the world.” - Socrata

Socrata

https://socrataapikeys.docs.apiary.io/#introduction/why-use-api-keys?

Wait a second! Authentication is only necessary when accessing datasets that have been marked as private or when making write requests (PUT, POST, and DELETE). For reading datasets that have not been marked as private, simply use an application token.

https://dev.socrata.com/docs/authentication.html
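
In other words, for the public mortality dataset used below, a read-only request needs at most an application token (requests without one share a throttled rate limit). A sketch, with a placeholder token:

```r
library(RSocrata)

#' public dataset: no email/password required; the app token only
#' raises the rate limit ("YOUR_APP_TOKEN" is a placeholder)
yearly_deaths <- read.socrata(
  "https://data.cdc.gov/resource/bi63-dtpu.json",
  app_token = "YOUR_APP_TOKEN"
)
```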

Census API Introduction

Census API Examples

library(yaml)
library(plyr)
library(dplyr)
library(readr)
library(RSocrata)
library(here)

Census API Examples

#' will only run if you have an appropriately formatted file,
#' `credentials/socrata_app_credentials.yml`, with valid CDC API credentials
socrata_app_credentials <- yaml.load_file(here("credentials/socrata_app_credentials.yml"))

#' Yearly Counts of Deaths by State and Select Causes, 1999-2017
#' https://data.cdc.gov/NCHS/NCHS-Leading-Causes-of-Death-United-States/bi63-dtpu
yearly_deaths_by_state_1999_2017 <- read.socrata(
  "https://data.cdc.gov/resource/bi63-dtpu.json",
  app_token = socrata_app_credentials$app_token,
  email = socrata_app_credentials$email,
  password  = socrata_app_credentials$password
)

Census API Examples

glimpse(yearly_deaths_by_state_1999_2017)
## Rows: 10,868
## Columns: 6
## $ year             <chr> "2017", "2017", "2017", "2017", "2017", "2017", "2017…
## $ X_113_cause_name <chr> "Accidents (unintentional injuries) (V01-X59,Y85-Y86)…
## $ cause_name       <chr> "Unintentional injuries", "Unintentional injuries", "…
## $ state            <chr> "United States", "Alabama", "Alaska", "Arizona", "Ark…
## $ deaths           <chr> "169936", "2703", "436", "4184", "1625", "13840", "30…
## $ aadr             <chr> "49.4", "53.8", "63.7", "56.2", "51.8", "33.2", "53.6…

Census API Examples

https://data.cdc.gov/resource/bi63-dtpu.json

Census API Examples

#' will only run if you have an appropriately formatted file, 
#' `credentials/socrata_app_credentials.yml`, with valid CDC API credentials
source(here("code", "01_get_nchs_mortality.R"))

Census API Examples

library(lubridate)
library(scales)
library(ggplot2)
yearly_deaths_by_state_1999_2022$all_deaths <- as.numeric(yearly_deaths_by_state_1999_2022$all_deaths)
yearly_deaths_by_state_1999_2022$year <- ymd(paste0(yearly_deaths_by_state_1999_2022$year, "-01-01"))
end_date <- ymd("2021-01-01")
start_date <- ymd("1999-01-01")
us_deaths_time_series <- ggplot(data = yearly_deaths_by_state_1999_2022 %>% 
               filter(state_name=="United States", year<ymd("2022-01-01"))) +
  geom_point(aes(x = year, y = all_deaths), color = "darkred", size = 1.5) + 
  geom_vline(xintercept = ymd("2019-12-01"), linetype="dashed", 
             color = "black", size=1) +  
  scale_y_continuous(labels=comma) +
  scale_x_date("", breaks = date_breaks("2 year"),
               limits = c(start_date, end_date),
               labels = date_format(format = "%Y")) + 
  theme(legend.position="none") + 
  labs(x = "Date", y = "Total Deaths",
       title = "US Total Deaths over Time: 1999-2021") +
  annotate(x=ymd("2019-12-01"),y=+Inf,label="COVID-19",vjust=1,geom="label")

Census API Examples

print(us_deaths_time_series)

ggsave(here("output/us_deaths_time_series.png"),
       us_deaths_time_series, width=10.67, height=6, dpi=120)

4. PurpleAir

PurpleAir Introduction

https://www.nbcnews.com/tech/tech-news/high-tech-california-relies-startup-utah-see-how-smoky-its-n1074906

PurpleAir Introduction

https://www2.purpleair.com/pages/install

PurpleAir Introduction

https://www2.purpleair.com/community/faq#hc-how-do-purpleair-sensors-work

PurpleAir Introduction

https://map.purpleair.com/1/mAQI/a10/p604800/cC0#9.72/29.785/-95.3946

PurpleAir API Introduction

#' will only run if you have an appropriately formatted file,
#' `credentials/purpleair_api_credentials.yml`, with valid PurpleAir API credentials
require(httr)

purpleair_api_credentials <- yaml.load_file(here("credentials/purpleair_api_credentials.yml"))

headers = c(
  `X-API-Key` = purpleair_api_credentials$read_key
)

result <- httr::GET(url = 'https://api.purpleair.com/v1/sensors/25999', httr::add_headers(.headers=headers))

PurpleAir API Introduction

result
## Response [https://api.purpleair.com/v1/sensors/25999]
##   Date: 2022-04-04 22:24
##   Status: 200
##   Content-Type: application/json;charset=utf-8
##   Size: 3.48 kB
## {
##   "api_version" : "V1.0.10-0.0.12",
##   "time_stamp" : 1649111052,
##   "data_time_stamp" : 1649110996,
##   "sensor" : {
##     "sensor_index" : 25999,
##     "last_modified" : 1554853637,
##     "date_created" : 1549304400,
##     "last_seen" : 1649110899,
##     "private" : 0,
## ...

PurpleAir API Introduction

names(content(result))
## [1] "api_version"     "time_stamp"      "data_time_stamp" "sensor"
length(names(content(result)$sensor))
## [1] 103
names(content(result)$sensor)[1:20]
##  [1] "sensor_index"     "last_modified"    "date_created"     "last_seen"       
##  [5] "private"          "is_owner"         "name"             "icon"            
##  [9] "location_type"    "model"            "hardware"         "led_brightness"  
## [13] "firmware_version" "rssi"             "uptime"           "pa_latency"      
## [17] "memory"           "position_rating"  "latitude"         "longitude"
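
Individual fields can be pulled from the parsed JSON list in the same way. Continuing with the `result` object above:

```r
#' navigate the nested list returned by content()
sensor <- content(result)$sensor
sensor$name                           #' the sensor's label
c(sensor$latitude, sensor$longitude)  #' its coordinates
```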

PurpleAir API Introduction

content(result)$sensor$stats
## $pm2.5
## [1] 28.7
## 
## $pm2.5_10minute
## [1] 27.8
## 
## $pm2.5_30minute
## [1] 24.5
## 
## $pm2.5_60minute
## [1] 22.1
## 
## $pm2.5_6hour
## [1] 16.2
## 
## $pm2.5_24hour
## [1] 15.9
## 
## $pm2.5_1week
## [1] 10.9
## 
## $time_stamp
## [1] 1649110899

Air Quality Index

https://www.airnow.gov/aqi/aqi-basics/
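
The AQI maps pollutant concentrations onto named categories. As a rough orientation (not the full AQI calculation, which interpolates a 0-500 index value), the pre-2024 US EPA breakpoints for 24-hour PM2.5 can be sketched as:

```r
#' simplified category lookup for 24-hour PM2.5 (ug/m3),
#' using the pre-2024 US EPA breakpoints
pm25_to_aqi_category <- function(pm25) {
  breaks <- c(-Inf, 12.0, 35.4, 55.4, 150.4, 250.4, Inf)
  labels <- c("Good", "Moderate", "Unhealthy for Sensitive Groups",
              "Unhealthy", "Very Unhealthy", "Hazardous")
  as.character(cut(pm25, breaks = breaks, labels = labels))
}

pm25_to_aqi_category(c(8, 28.7, 60))
#' "Good" "Moderate" "Unhealthy"
```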

R’s AirSensor Package

https://github.com/MazamaScience/AirSensor

R’s AirSensor Package

The AirSensor package has three data models:

R’s AirSensor Package

Think about the following questions:

What type of AirSensor data do we need to look at which locations have a moderate to unhealthy 30-minute air quality rating in Texas?

What type of AirSensor data do we need to check air quality recorded by a sensor named “Royal Oaks Houston Tx - Outside” between 01-01-2020 and 01-15-2020?

R’s AirSensor Package

Install and load relevant packages:

R’s AirSensor Package

Note that the AirMonitorPlots package might not yet be available on CRAN. To install it, try the devtools package (see code/install.R).

#' load all the packages
library(PWFSLSmoke)
library(AirSensor)
library(AirMonitorPlots)
library(MazamaSpatialUtils)

Getting the data

set up a local archive folder

The AirSensor package needs to know where processed data will live. For this report, we will specify a local archiveBaseDir where downloaded and processed data will live.

Getting the data

#' assign a name to the new local folder
archiveBaseDir <- here("data", "Australia_on_fire")

#' check if the same-named folder exists
#' if a same-named folder exists, print the warning
#' if no same-named folder, create the folder
if (file.exists(archiveBaseDir)) {
 cat("The folder already exists")
} else {
 dir.create(archiveBaseDir)
}
## The folder already exists
#' set the package base directory to an archive of pre-generated data files 
setArchiveBaseDir(archiveBaseDir)

Load synoptic data for Australia

We will use the pas_createNew() function to create a pas object containing all the spatial metadata associated with PurpleAir monitors in Australia.

To create a new pas object you must first properly initialize the MazamaSpatialUtils package.

#' set package data directory
#' install required spatial data
#' initialize the package

filePath_pas <- file.path(archiveBaseDir, "pas_au.rda")
setSpatialDataDir(archiveBaseDir)
installSpatialData("NaturalEarthAdm1")
installSpatialData("CA_AirBasins")
initializeMazamaSpatialUtils()

Load synoptic data for Australia

#' Download, parse and enhance synoptic data from PurpleAir
#' and return the results as a useful tibble with class pa_synoptic
pas_au <- pas_createNew(countryCodes = "AU", includePWFSL = TRUE)

#' saving and loading the downloaded local file
#' save the synoptic data into an .rda file
save(pas_au, file = here("data", "pas_au.rda"))

#' load data from the .rda file
# pas_au <- get(load(here("data", "pas_au.rda"))) 

Load synoptic data for Australia

A pas object is a dataframe that contains metadata and PM2.5 averages for many PurpleAir sensors in a designated region. Each pas object can be filtered and edited to retain whichever collection of sensors the analyst desires based on location, state, name, etc.

It is important to note that the data averages in the pas object – the numeric values for PM2.5 or temperature or humidity – are current at the time that the pas is created. pas objects can be used to quickly explore the spatial distribution of PurpleAir sensors and display some then-current values but should not be used for detailed analysis.
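
For quick exploration, a pas can be filtered on any of its columns. For example, here is a sketch of pulling out the sensors whose then-current 1-hour PM2.5 average exceeded the "Unhealthy" threshold (this assumes the `pas_au` object above and the `pm25_1hr` column used later in this workshop):

```r
#' sensors reporting 1-hour PM2.5 above 55.4 ug/m3 when the pas was created
pas_au %>%
  pas_filter(pm25_1hr > 55.4) %>%
  dplyr::select(label, stateCode, pm25_1hr)
```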

Load synoptic data for United States

Create a pas object with current data in US. Save this pas file to your computer for later use.

pas_us <- pas_createNew(countryCodes = "US")
pas_tx <- pas_us %>% pas_filter(stateCode=="TX")

#' saving and loading the downloaded local file
#' load data from the .rda file
save(pas_us, file = here("data", "pas_us.rda"))
save(pas_tx, file = here("data", "pas_tx.rda"))
# pas_us <- get(load(here("data", "pas_us.rda")))
# pas_tx <- get(load(here("data", "pas_tx.rda")))

View a pas map

We can view the locations of each sensor and the AQI (Air Quality Index) maxima (when the pas was created) using the pas_leaflet() function.

pas_leaflet(pas_au)

View a pas map

pas_leaflet(pas_tx)

We can also explore and utilize other PurpleAir sensor data. Check the pas_leaflet() documentation for all supported parameters. By default, pas_leaflet() will map the coordinates of each PurpleAir sensor and the hourly PM2.5 data.

View a pas map

Here is an example of humidity data captured from PurpleAir sensors across the state of New South Wales.

pas_au %>% 
  pas_filter(stateCode == "NS") %>% 
  pas_leaflet(parameter = "humidity")

View a pas map

Use your pas object on US, map the hourly PM2.5 data in Texas

pas_tx %>% 
  pas_leaflet(parameter = "pm25_1hr") 

Load time series data for a sensor

PurpleAir sensor readings are uploaded to the cloud every 120 seconds where they are stored for download and display on the PurpleAir website. After every interval, the synoptic data is refreshed and the outdated synoptic data is then stored in a ThingSpeak database. In order to access the ThingSpeak channel API we must first load the synoptic database.

ThingSpeak API

https://en.wikipedia.org/wiki/ThingSpeak

Load time series data for a sensor

Let’s look at data from multiple sensors in the Sydney area and one from Brisbane (north of Sydney) for the time period covering the Australian wildfires, December 2019 to January 2020.

Load time series data for a sensor

gymea_bay_label <- c("Gymea Bay") #' Gymea Bay, Sydney, AU (southern Sydney)
north_sydney_label <- c("Glen Street, Milson’s Point, NSW, Australia") #' North Sydney, AU
brisbane_6th_ave_label <- c("St Lucia - 6th Ave") #' Brisbane, AU sensor

#' view unique labels in `pas_au` object
# unique(pas_au$label)

pat_gymea_bay <- pat_createNew(
    pas = pas_au, 
    label = gymea_bay_label, 
    startdate = 20191229, 
    enddate = 20200110
  )

save(pat_gymea_bay, file = here("data", "pat_gymea_bay.rda"))

Load time series data for a sensor

pat_north_sydney <- pat_createNew(
    pas = pas_au,
    label = north_sydney_label,
    startdate = 20191229,
    enddate = 20200110
  )

save(pat_north_sydney, file = here("data", "pat_north_sydney.rda"))

Load time series data for a sensor

pat_brisbane_6th_ave <- pat_createNew(
    pas = pas_au, 
    label = brisbane_6th_ave_label, 
    startdate = 20191229, 
    enddate = 20200110
  )

save(pat_brisbane_6th_ave, file = here("data", "pat_brisbane_6th_ave.rda"))

View a pat timeseries

A pat object is a list of two dataframes: one called meta, containing spatial metadata associated with the sensor, and another called data, containing that sensor’s time series data. Each pat object contains time series data for a temperature channel, a humidity channel, the A and B PM2.5 channels, and several other fields.

The following chunk demonstrates use of the pat_multiplot() function to have a quick look at the data contained in a pat object. The plot shows both A and B channels as well as temperature and humidity. The plotting function is flexible and has options for choosing which channels to display, on the same or individual axes. (Type ?pat_multiplot to learn more.)

pat_multiplot(pat_gymea_bay)

View a pat timeseries

pat_multiplot(pat_north_sydney)

View a pat timeseries

pat_multiplot(pat_brisbane_6th_ave)

View a pat timeseries

start_date <- 20200101
end_date <- 20200115

pat_houston <- pat_createNew(label = "Royal Oaks Houston Tx - Outside",
                     pas = pas_tx,
                     startdate = start_date,
                     enddate = end_date
                     )

pat_houston %>%
  pat_multiplot(plottype = "all")

#' save the synoptic data into an .rda file
save(pat_houston, file = here("data", "pat_houston.rda")) 

Explore the pas object

For the purposes of this exploratory example, we are focusing on Australia but maybe we want to filter even more and just look at the sensors within a certain radius of the one we chose in Sydney.

lon <- pat_gymea_bay$meta$longitude #' get the longitude of sensor "Gymea Bay"
lat <- pat_gymea_bay$meta$latitude #' get the latitude of sensor "Gymea Bay"

pas_sydney <- 
  pas_au %>%
  #' Filter for PurpleAir sensors 
  #' within a specified distance from specified target coordinates.
  pas_filterNear(
    longitude = lon, 
    latitude = lat, 
    radius = "50 km"
  ) 

Explore the pas object

pas_leaflet(pas_sydney)

Explore the pas object

library(tidygeocoder)

houston_geocode <- geo("Houston, Texas", method = "osm", full_results = TRUE)
houston_geocode

pas_houston <- 
  pas_tx %>%
  #' Filter for PurpleAir sensors 
  #' within a specified distance from specified target coordinates.
  #' (exercise: fill in the blanks using the `houston_geocode` results above)
  pas_filterNear(
    longitude = , 
    latitude = , 
    radius = 
  ) 

Explore the pat object

Download and plot pat data

We can create pat objects for sensors listed in the pas. Since the fires in Australia started, new PurpleAir sensors have been popping up left and right. Let’s start by grabbing data from sensors with the longest history.

start_date <- 20191210
end_date <- 20200110

pat_chisholm <- pat_createNew(
    label = "Chisholm",
    pas = pas_au,
    startdate = start_date,
    enddate = end_date
  )

pat_moruya <- pat_createNew(
    label = "MORUYA HEADS",
    pas = pas_au,
    startdate = start_date,
    enddate = end_date
  )

pat_windang <- pat_createNew(
    label = "Windang, Ocean Street",
    pas = pas_au,
    startdate = start_date,
    enddate = end_date
  )

Explore the pat object

In order to look for patterns, we can look at the PM2.5 data recorded on channel A from all the sensors. This chunk uses ggplot2 to view all the data on the same axis.

colors <- c("Chisholm" = "#1b9e77", 
            "Moruya" = "#d95f02", 
            "Windang" = "#7570b3")

multisensor_pm25_plot <- ggplot(data = pat_chisholm$data) +
  geom_point(aes(x = datetime, y = pm25_A, color = "Chisholm"), alpha = 0.5) +
  geom_point(data = pat_moruya$data,
             aes(x = datetime, y = pm25_A, color = "Moruya"), alpha = 0.5) +
  geom_point(data = pat_windang$data,
             aes(x = datetime, y = pm25_A, color = "Windang"), alpha = 0.5) +
  labs(title = "PM 2.5 channel A for multiple sensors") +
  xlab("date") +
  ylab("ug/m3") +
  scale_colour_manual(name = "Sensor", values = colors) +
  theme(legend.position = c(0.9, 0.8))

Explore the pat object

print(multisensor_pm25_plot)

Explore the pat object

What do you find from the above graph? What other information would you check to confirm your findings?

Let’s check a few other sensors that are closer in proximity to Chisholm to see if they are also reporting abnormal values. Download and plot data on sensors “Bungendore, NSW Australia” and “Downer” together with “Chisholm” for the same time period. Make your plot easy to read. What do you find?

Are there any other factors that may impact the validity of the sensor data we observed?
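
A sketch of the first exercise, reusing the pattern above (the sensor labels come from the exercise text; the download requires the pas_au object and the start_date/end_date values defined earlier):

```r
#' download the two sensors near Chisholm for the same window
pat_bungendore <- pat_createNew(
    label = "Bungendore, NSW Australia",
    pas = pas_au,
    startdate = start_date,
    enddate = end_date
  )

pat_downer <- pat_createNew(
    label = "Downer",
    pas = pas_au,
    startdate = start_date,
    enddate = end_date
  )
```

The three pat objects can then be layered with geom_point() exactly as in the Chisholm/Moruya/Windang plot above.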

Explore the pat object

Use the following ids for sensors in Pasadena and Houston to find the associated sensor labels and recreate the analysis done on the Chisholm, Moruya, and Windang sensors

pasadena_ids <- c("98633", "99813")
houston_ids <- c("26659", "133994")

pas_tx %>% 
  filter(ID %in% pasadena_ids | ID %in% houston_ids)
## # A tibble: 4 × 44
##   ID     label                DEVICE_LOCATION… THINGSPEAK_PRIM… THINGSPEAK_PRIM…
##   <chr>  <chr>                <chr>            <chr>            <chr>           
## 1 98633  AAH Meadowlake       outside          1295131          61K7OC25Z3F0CUK8
## 2 99813  AAH Wyne             outside          1303443          EHROTVA5HI2WIY5B
## 3 133994 Eastwood Houston Te… outside          1543032          037RM00T3L4PC0FO
## 4 26659  Rice Military Div    outside          702833           9NT4RUKKX2EG26BH
## # … with 39 more variables: THINGSPEAK_SECONDARY_ID <chr>,
## #   THINGSPEAK_SECONDARY_ID_READ_KEY <chr>, latitude <dbl>, longitude <dbl>,
## #   pm25 <dbl>, lastSeenDate <dttm>, sensorType <chr>, flag_hidden <lgl>,
## #   flag_highValue <lgl>, isOwner <int>, humidity <dbl>, temperature <dbl>,
## #   pressure <dbl>, age <int>, parentID <chr>, flag_attenuation_hardware <lgl>,
## #   Ozone1 <chr>, pm25_current <dbl>, pm25_10min <dbl>, pm25_30min <dbl>,
## #   pm25_1hr <dbl>, pm25_6hr <dbl>, pm25_1day <dbl>, pm25_1week <dbl>, …

Sensor State-of-Health

The pat data quality can degrade over time. For a quick sanity check, we can use the pat_dailySoHIndexPlot() function to plot the daily State-of-Health index. This function plots both channels A and B with a daily State-of-Health index along the bottom.

pat_dailySoHIndexPlot(pat_chisholm)

Can we make the conclusion that the smoke in Sydney area was clearly very bad over our time period of interest?

Plot and evaluate the state-of-health for a Houston area sensor we just looked at.
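
Assuming the pat_houston object created earlier is still in scope, the same check is a single call:

```r
#' daily State-of-Health index for the Houston sensor
pat_dailySoHIndexPlot(pat_houston)
```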

Working with airsensor data

Plot the daily average air quality

Another way to look at the PurpleAir sensor data is to convert the pat into an airsensor object. The following chunk aggregates data from a pat object into an airsensor object with an hourly time axis.

Because we are dealing with values far outside the norm, we will use the PurpleAirQC_hourly_AB_00 function, which performs minimal quality control. See the package documentation for more details.

#' create an airsensor object
airsensor_chisholm <- pat_createAirSensor(
  pat = pat_chisholm,
  parameter = "pm25",
  FUN = PurpleAirQC_hourly_AB_00
)

airsensor_houston <- pat_createAirSensor(
  pat = pat_houston,
  parameter = "pm25",
  FUN = PurpleAirQC_hourly_AB_00
)

Advanced plotting

Now that we have converted the relatively raw pat data into an airsensor object, we can use any of the “monitor” plotting functions found in the PWFSLSmoke or AirMonitorPlots packages.

AirMonitorPlots::monitor_ggDailyBarplot(airsensor_chisholm)

Here, the bar associated with each day is colored by Air Quality Index (AQI). Over this time period there were only 10 days where the daily average AQI was below “Unhealthy”.

Incorporating wind data

To get a sense of what direction smoke is coming from, we use the sensor_pollutionRose() function. As the name implies, this function takes an airsensor object as an argument. It then obtains hourly wind direction and speed data from the nearest meteorological site and plots a traditional wind rose plot for wind direction and PM2.5.

In this case, it looks like the smoke is coming mostly from the E/NE, which is validated by wind rose plots from the Australian Bureau of Meteorology.

sensor_pollutionRose(sensor = airsensor_chisholm)

Incorporating wind data

Check the wind direction of your pat_houston object.

sensor_pollutionRose(sensor = airsensor_houston)

5. Conclusion

In this Intro to API workshop we (hopefully)

6. References